---
title: "*In silico* analysis of - Combination of outputs according to Frenzel et al. 2017"
output: html_notebook
editor_options: 
  chunk_output_type: inline
---

#Libraries

```{r initial package load, message=TRUE, warning=FALSE, echo=TRUE}
library(tidyverse)
library(dbplyr)
library(scales)
library(matrixStats)
library(DT)
library(data.table)

```

The QSAR predictions were processed and combined as described in Frenzel et al. 2017

In the table below the data transformations are summarized

# Data preparation of mutagenic predictions for each analysis 


Tool  | Test   |Negative  |  Positive  |Notes
------|-----------------------------------------|-----------------|-----------------|-----------------------------------------
TEST  | *Mutagenicity Consensus* | 0 | 1 | Take output as is
LAZAR | *Mutagenicity (Salmonella typhimurium)* | mutagenic | non-mutagenic | Footnote 1  
VEGA  | *Mutagenicity consensus model*    | 0 | 1 | Take output as is
 

# Data preparation of carcinogenic predictions for each analysis

Tool  | Test   |Negative  |  Positive  |Notes
------|-----------------------------------------|-----------------|-----------------|-----------------------------------------
VEGA  |*Carcinogenicity model (CAESAR) - assessment*	|NON-Carcinogen |Carcinogen |Conversion as in Frenzel et al. 
VEGA  |*Carcinogenicity model (ISS) - assessment*	 |NON-Carcinogen |Carcinogen |Conversion as in Frenzel et al.
VEGA  |*Carcinogenicity model (IRFMN/Antares) - assessment*	|NON-Carcinogen |Carcinogen |Conversion as in Frenzel et al.
VEGA  |*Carcinogenicity model (IRFMN/ISSCAN-CGX) - assessment*|NON-Carcinogen |Carcinogen |Conversion as in Frenzel et al.
LAZAR |*Rodents multiple species* |carcinogenic |non-carcinogenic | Footnote 1
LAZAR |*Rat* |carcinogenic |non-carcinogenic | Footnote 1
LAZAR |*Mouse* |carcinogenic |non-carcinogenic | Footnote 1

Footnote 1: In LAZAR additional probability score of pos/neg prediction is provided which indicate the probabilities that the prediction belongs to one of the two classes. The approach described in Frenzel et al was adapted for use with this new output. In essence, the difference between the two probabilities was calculated and then re-scaled on a scale from 0-1. If a compound was in the training set, 1 and 0 were assigned for experimenally tested compounds which showed (1) or did not show (0) mutagenic (carcinogenic) activities. If no result was obtained the score was set to 0.5. 


Table 1. Frenzel et al, 2017

Prediction      | Reliability | Carcinogenoic score
----------------|-----------------------|------------------
Carcinogen      |Experimental activity  |1
                |Good reliability       |0.9
                |Moderate reliability   |0.7
                |Low reliability        |0.5
Non-carcinogen  |Low reliability        |0.5       
                |Moderate reliability   |0.3
                |Good reliability       |0.1
                |Experimental activity  |0

#Mutagenicity

Retrieve data required for this analysis.

##TEST

```{r read TEST Excel data}
test_muta_out <- readxl::read_excel("./Compiled_tox_output.xlsx", sheet = "Tox - TEST muta out")
head(test_muta_out)

test_muta <- test_muta_out %>% 
  mutate(mutagen_test = rescale(.$Pred_Consensus, to=c(0,1)))
```

##LAZAR

```{r read LAZAR Excel mutagen data}
lazar_muta_out <- readxl::read_excel("./Compiled_tox_output.xlsx", sheet = "Tox - LAZAR muta out")
```

```{r Extract mutagen data from LAZAR, message=TRUE, echo=TRUE}
lazar_muta <- lazar_muta_out %>% 
  mutate(mutagen_lazar = case_when(
     is.na(.$`Lazar Prediction`) ~ 0.5,
     .$Measurements == "mutagenic" ~ 1,
     .$Measurements == "non-mutagenic" ~ 0,
    .$`Lazar Prediction` == "mutagenic" ~ rescale( 
      .$`Lazar predProbability mutagenic` / .$`Lazar predProbability non-mutagenic`,to=c(0.5,1)),
    .$`Lazar Prediction` == "non-mutagenic" ~ rescale( 
      .$`Lazar predProbability non-mutagenic` / .$`Lazar predProbability mutagenic`, to=c(0.5,0))))
```


##VEGA

```{r read VEGA Excel mutagenicity data}
vega_out <- readxl::read_excel("./Compiled_tox_output.xlsx", sheet = "Tox - VEGA out")
vega_out$ID <- as.numeric(gsub("#", "", vega_out$ID))
```


```{r Extract mutagen data from VEGA, message=TRUE, echo=TRUE}
eq_vega_mut <- vega_out %>% select("ID", starts_with("Muta")) %>% 
  select("ID", contains("CONSENSUS")) %>% 
  select("ID", contains("assessment")) %>% 
  separate("Mutagenicity (Ames test) CONSENSUS model - assessment", into =c ("CMP", "ADI"), sep="[(]Consensus score:") %>% 
  mutate(ADI = as.numeric(str_replace(ADI, "[)]", ""))) %>% 
  mutate(mutagen_vega = case_when(
    .$CMP == "Mutagenic " ~  rescale(.$ADI, to=c(0.5,1)),
    .$CMP == "NON-Mutagenic " ~ rescale(.$ADI, to=c(0.5, 0))))
```


#Carcinogenicity

##LAZAR


```{r read LAZAR Excel carcinogen data}
#Connection to SQlite
lazar_carc_rodent_out <- readxl::read_excel("./Compiled_tox_output.xlsx", sheet = "Tox - LAZAR carc out1", skip = 1) %>% type.convert()

lazar_carc_rat_out <- readxl::read_excel("./Compiled_tox_output.xlsx", sheet = "Tox - LAZAR carc output2", skip = 2) %>% type.convert()

lazar_carc_mouse_out <- readxl::read_excel("./Compiled_tox_output.xlsx", sheet = "Tox - LAZAR carc output3", skip = 2) %>% type.convert()
```

```{r prepare LAZAR canc rodent, message=TRUE, echo=TRUE}
lazar_carc_rodent <- lazar_carc_rodent_out %>% 
  mutate(carci_lazar_rod = case_when(
     is.na(.$`Lazar Prediction`) ~ 0.5,
     .$Measurements == "carcinogenic" ~ 1,
     .$Measurements == "non-carcinogenic" ~ 0,
    .$`Lazar Prediction` == "carcinogenic" ~ rescale( 
      .$`Lazar predProbability carcinogenic` / .$`Lazar predProbability non-carcinogenic`,to=c(0.5,1)),
    .$`Lazar Prediction` == "non-carcinogenic" ~ rescale( 
      .$`Lazar predProbability non-carcinogenic` / .$`Lazar predProbability carcinogenic`, to=c(0.5,0))))
```

```{r prepare LAZAR canc rat, message=TRUE, echo=TRUE}
lazar_carc_rat <- lazar_carc_rat_out %>% 
  mutate(carci_lazar_rat = case_when(
     is.na(.$`Lazar Prediction`) ~ 0.5,
     .$Measurements == "carcinogenic" ~ 1,
     .$Measurements == "non-carcinogenic" ~ 0,
    .$`Lazar Prediction` == "carcinogenic" ~ rescale( 
      .$`Lazar predProbability carcinogenic` - .$`Lazar predProbability non-carcinogenic`,to=c(0.5,1)),
    .$`Lazar Prediction` == "non-carcinogenic" ~ rescale( 
      .$`Lazar predProbability non-carcinogenic` - .$`Lazar predProbability carcinogenic`, to=c(0.5,0))))
```

```{r prepare LAZAR canc mouse, message=TRUE, echo=TRUE}
lazar_carc_mouse <- lazar_carc_mouse_out %>% 
  mutate(carci_lazar_mus = case_when(
     is.na(.$`Lazar Prediction`) ~ 0.5,
     .$Measurements == "carcinogenic" ~ 1,
     .$Measurements == "non-carcinogenic" ~ 0,
    .$`Lazar Prediction` == "carcinogenic" ~ rescale( 
      .$`Lazar predProbability carcinogenic` - .$`Lazar predProbability non-carcinogenic`,to=c(0.5,1)),
    .$`Lazar Prediction` == "non-carcinogenic" ~ rescale( 
      .$`Lazar predProbability non-carcinogenic` - .$`Lazar predProbability carcinogenic`, to=c(0.5,0))))
```


##VEGA

```{r Extract carcinogen data from VEGA, message=TRUE, echo=TRUE}
vega_carc <- vega_out %>% select("ID", starts_with("Carci")) %>% 
  select("ID", contains("assessment")) 

vega_carc_ceasar <- vega_carc %>% select("ID", contains("CAESAR")) %>% 
  separate("Carcinogenicity model (CAESAR) - assessment", into =c ("CMP_CAESAR", "ADI_CAESAR"), sep="[(]") %>% 
  mutate(ADI_CAESAR = str_replace(ADI_CAESAR, "[)]", "")) %>% 
  mutate(vega_carci_caesar = case_when(
    (.$CMP_CAESAR == "Carcinogen " | .$CMP_CAESAR == "Possible Carcinogen ")  & ADI_CAESAR == "EXPERIMENTAL value" ~ 1.0,
    (.$CMP_CAESAR == "Carcinogen " | .$CMP_CAESAR == "Possible Carcinogen ")  & ADI_CAESAR == "good reliability" ~ 0.9,
    (.$CMP_CAESAR == "Carcinogen " | .$CMP_CAESAR == "Possible Carcinogen ")  & ADI_CAESAR == "moderate reliability" ~ 0.7,
    (.$CMP_CAESAR == "Carcinogen " | .$CMP_CAESAR == "Possible Carcinogen ")  & ADI_CAESAR == "low reliability" ~ 0.5,
    (.$CMP_CAESAR == "NON-Carcinogen "  | .$CMP_CAESAR == "Possible NON-Carcinogen ") & ADI_CAESAR == "low reliability" ~ 0.5,
    (.$CMP_CAESAR == "NON-Carcinogen "  | .$CMP_CAESAR == "Possible NON-Carcinogen ") & ADI_CAESAR == "moderate reliability" ~ 0.3,
    (.$CMP_CAESAR == "NON-Carcinogen "  | .$CMP_CAESAR == "Possible NON-Carcinogen ") & ADI_CAESAR == "good reliability" ~ 0.1,
    (.$CMP_CAESAR == "NON-Carcinogen "  | .$CMP_CAESAR == "Possible NON-Carcinogen ") & ADI_CAESAR == "EXPERIMENTAL value" ~ 0.0))

vega_carc_iss <- vega_carc %>% select("ID", contains("ISS")) %>% 
  separate("Carcinogenicity model (ISS) - assessment", into =c ("CMP_iss", "ADI_iss"), sep="[(]") %>% 
  mutate(ADI_iss = str_replace(ADI_iss, "[)]", "")) %>% 
  mutate(vega_carci_iss = case_when(
    (.$CMP_iss == "Carcinogen " | .$CMP_iss == "Possible Carcinogen ")  & ADI_iss == "EXPERIMENTAL value" ~ 1.0,
    (.$CMP_iss == "Carcinogen " | .$CMP_iss == "Possible Carcinogen ")  & ADI_iss == "good reliability" ~ 0.9,
    (.$CMP_iss == "Carcinogen " | .$CMP_iss == "Possible Carcinogen ")  & ADI_iss == "moderate reliability" ~ 0.7,
    (.$CMP_iss == "Carcinogen " | .$CMP_iss == "Possible Carcinogen ")  & ADI_iss == "low reliability" ~ 0.5,
    (.$CMP_iss == "NON-Carcinogen "  | .$CMP_iss == "Possible NON-Carcinogen ") & ADI_iss == "low reliability" ~ 0.5,
    (.$CMP_iss == "NON-Carcinogen "  | .$CMP_iss == "Possible NON-Carcinogen ") & ADI_iss == "moderate reliability" ~ 0.3,
    (.$CMP_iss == "NON-Carcinogen "  | .$CMP_iss == "Possible NON-Carcinogen ") & ADI_iss == "good reliability" ~ 0.1,
    (.$CMP_iss == "NON-Carcinogen "  | .$CMP_iss == "Possible NON-Carcinogen ") & ADI_iss == "EXPERIMENTAL value" ~ 0.0))


vega_carc_antares <- vega_carc %>% select("ID", contains("Antares")) %>% 
  separate("Carcinogenicity model (IRFMN/Antares) - assessment", into =c ("CMP_antares", "ADI_antares"), sep="[(]") %>% 
  mutate(ADI_antares = str_replace(ADI_antares, "[)]", "")) %>% 
  mutate(vega_carci_antares = case_when(
    (.$CMP_antares == "Carcinogen " | .$CMP_antares == "Possible Carcinogen ")  & ADI_antares == "EXPERIMENTAL value" ~ 1.0,
    (.$CMP_antares == "Carcinogen " | .$CMP_antares == "Possible Carcinogen ")  & ADI_antares == "good reliability" ~ 0.9,
    (.$CMP_antares == "Carcinogen " | .$CMP_antares == "Possible Carcinogen ")  & ADI_antares == "moderate reliability" ~ 0.7,
    (.$CMP_antares == "Carcinogen " | .$CMP_antares == "Possible Carcinogen ")  & ADI_antares == "low reliability" ~ 0.5,
    (.$CMP_antares == "NON-Carcinogen " | .$CMP_antares == "Possible NON-Carcinogen ")  & ADI_antares == "low reliability" ~ 0.5,
    (.$CMP_antares == "NON-Carcinogen " | .$CMP_antares == "Possible NON-Carcinogen ")  & ADI_antares == "moderate reliability" ~ 0.3,
    (.$CMP_antares == "NON-Carcinogen " | .$CMP_antares == "Possible NON-Carcinogen ")  & ADI_antares == "good reliability" ~ 0.1,
    (.$CMP_antares == "NON-Carcinogen " | .$CMP_antares == "Possible NON-Carcinogen ")  & ADI_antares == "EXPERIMENTAL value" ~ 0.0))

vega_carc_cgx <- vega_carc %>% select("ID", contains("IRFMN/ISSCAN-CGX")) %>% 
  separate("Carcinogenicity model (IRFMN/ISSCAN-CGX) - assessment", into =c ("CMP_cgx", "ADI_cgx"), sep="[(]") %>% 
  mutate(ADI_cgx = str_replace(ADI_cgx, "[)]", "")) %>% 
  mutate(vega_carci_cgx = case_when(
    (.$CMP_cgx == "Carcinogen " | .$CMP_cgx == "Possible Carcinogen ") & ADI_cgx == "EXPERIMENTAL value" ~ 1.0,
    (.$CMP_cgx == "Carcinogen " | .$CMP_cgx == "Possible Carcinogen ") & ADI_cgx == "good reliability" ~ 0.9,
    (.$CMP_cgx == "Carcinogen " | .$CMP_cgx == "Possible Carcinogen ") & ADI_cgx == "moderate reliability" ~ 0.7,
    (.$CMP_cgx == "Carcinogen " | .$CMP_cgx == "Possible Carcinogen ") & ADI_cgx == "low reliability" ~ 0.5,
    (.$CMP_cgx == "NON-Carcinogen " | .$CMP_cgx == "Possible NON-Carcinogen ") & ADI_cgx == "low reliability" ~ 0.5,
    (.$CMP_cgx == "NON-Carcinogen " | .$CMP_cgx == "Possible NON-Carcinogen ") & ADI_cgx == "moderate reliability" ~ 0.3,
    (.$CMP_cgx == "NON-Carcinogen " | .$CMP_cgx == "Possible NON-Carcinogen ")  & ADI_cgx == "good reliability" ~ 0.1,
    (.$CMP_cgx == "NON-Carcinogen " | .$CMP_cgx == "Possible NON-Carcinogen ") & ADI_cgx == "EXPERIMENTAL value" ~ 0.0))

```


#Summary files


## Mutagenicity


```{r}
xlsx::write.xlsx(test_muta, "./Compiled_tox_output.xlsx", sheetName="test_muta", col.names=TRUE, row.names=TRUE, showNA=FALSE, append=TRUE)
xlsx::write.xlsx(lazar_muta, "./Compiled_tox_output.xlsx", sheetName="lazar_muta", col.names=TRUE, row.names=TRUE, showNA=FALSE, append=TRUE)
xlsx::write.xlsx(eq_vega_mut, "./Compiled_tox_output.xlsx", sheetName="eq_vega_mut", col.names=TRUE, row.names=TRUE, showNA=FALSE, append=TRUE)

test_muta_out_smry <- test_muta %>% select(ID, Compound, CASRN, SMILES, mutagen_test)
lazar_muta_smry <-lazar_muta %>% select(ID, mutagen_lazar)
eq_vega_mut_smry <- eq_vega_mut %>%  select(ID, mutagen_vega)

mutagens <- left_join(test_muta_out_smry, lazar_muta_smry, by="ID") 
mutagens <- left_join(mutagens, eq_vega_mut_smry, by="ID")

select_vars <- c("mutagen_test",  "mutagen_lazar", "mutagen_vega")
mutagens <- mutagens %>% 
  mutate(mean_muta3 = rowMeans(select(., select_vars)))

xlsx::write.xlsx(mutagens, "./Compiled_tox_output.xlsx", sheetName="mutagens", col.names=TRUE, row.names=TRUE, showNA=FALSE, append=TRUE)


```

## Carcinogenicity


```{r carci LAZAR summary}
xlsx::write.xlsx(lazar_carc_rodent, "./Compiled_tox_output.xlsx", sheetName="lazar_carc_rodent", col.names=TRUE, row.names=TRUE, showNA=FALSE, append=TRUE)
xlsx::write.xlsx(lazar_carc_rat, "./Compiled_tox_output.xlsx", sheetName="lazar_carc_rat", col.names=TRUE, row.names=TRUE, showNA=FALSE, append=TRUE)
xlsx::write.xlsx(lazar_carc_mouse, "./Compiled_tox_output.xlsx", sheetName="lazar_carc_mouse", col.names=TRUE, row.names=TRUE, showNA=FALSE, append=TRUE)

lazar_carc_rodent_smry <- lazar_carc_rodent %>% select(ID, carci_lazar_rod)
lazar_carc_rat_smry <- lazar_carc_rat %>% select(ID, carci_lazar_rat)
lazar_carc_mouse_smry <- lazar_carc_mouse %>% select(ID, carci_lazar_mus)



lazar_carci_smry <- left_join(lazar_carc_rodent_smry, lazar_carc_rat_smry, by="ID") 
lazar_carci_smry <- left_join(lazar_carci_smry, lazar_carc_mouse_smry, by="ID")

select_vars <- c("carci_lazar_rod", "carci_lazar_rat",  "carci_lazar_mus")
lazar_carci_smry <- lazar_carci_smry %>% 
  mutate(mean_carci_LAZAR3 = rowMeans(select(., select_vars))) 

xlsx::write.xlsx(lazar_carci_smry, "./Compiled_tox_output.xlsx", sheetName="lazar_carci_smry", col.names=TRUE, row.names=TRUE, showNA=FALSE, append=TRUE)


```



```{r carci VEGA summary}
xlsx::write.xlsx(vega_carc_ceasar, "./Compiled_tox_output.xlsx", sheetName="vega_carc_ceasar", col.names=TRUE, row.names=TRUE, showNA=FALSE, append=TRUE)
xlsx::write.xlsx(vega_carc_iss, "./Compiled_tox_output.xlsx", sheetName="vega_carc_iss", col.names=TRUE, row.names=TRUE, showNA=FALSE, append=TRUE)
xlsx::write.xlsx(vega_carc_antares, "./Compiled_tox_output.xlsx", sheetName="vega_carc_antares", col.names=TRUE, row.names=TRUE, showNA=FALSE, append=TRUE)
xlsx::write.xlsx(vega_carc_cgx, "./Compiled_tox_output.xlsx", sheetName="vega_carc_cgx", col.names=TRUE, row.names=TRUE, showNA=FALSE, append=TRUE)

vega_carc_ceasar_smry <- vega_carc_ceasar %>% select(ID, vega_carci_caesar)
vega_carc_iss_smry <- vega_carc_iss %>% select(ID, vega_carci_iss)
vega_carc_antares_smry <- vega_carc_antares %>% select(ID, vega_carci_antares)
vega_carc_cgx_smry <- vega_carc_cgx %>% select(ID, vega_carci_cgx)



vega_carci_smry_1 <- left_join(vega_carc_ceasar_smry, vega_carc_iss_smry, by="ID") 
vega_carci_smry_2 <- left_join(vega_carc_antares_smry, vega_carc_cgx_smry, by="ID")
vega_carci_smry <- left_join(vega_carci_smry_1, vega_carci_smry_2, by="ID")

select_vars <- c("vega_carci_caesar", "vega_carci_iss", "vega_carci_antares", "vega_carci_cgx")
vega_carci_smry <- vega_carci_smry %>% 
  mutate(mean_carci_vega4 = rowMeans(select(., select_vars)))

xlsx::write.xlsx(vega_carci_smry, "./Compiled_tox_output.xlsx", sheetName="vega_carci_smry", col.names=TRUE, row.names=TRUE, showNA=FALSE, append=TRUE)
```


```{r carci LAZAR and VEGA summary }

carci_smry <- left_join(lazar_carci_smry, vega_carci_smry, by="ID") 

select_vars <- c("carci_lazar_rod", "carci_lazar_rat",  "carci_lazar_mus", "vega_carci_caesar", "vega_carci_iss", "vega_carci_antares", "vega_carci_cgx")

carci_smry <- carci_smry %>% 
  mutate(mean_carci = rowMeans(select(., select_vars))) 

vega_out_sub <- vega_out %>% select(ID, Compound, CASRN, SMILES)
carci_smry <- left_join(vega_out_sub, carci_smry, by="ID")

xlsx::write.xlsx(carci_smry, "./Compiled_tox_output.xlsx", sheetName="carci_smry", col.names=TRUE, row.names=TRUE, showNA=FALSE, append=TRUE)
```



# Plots

## Mutagenicity

### Barplot 

```{r}
muta <-
    mutagens %>%
    arrange(-mean_muta3) %>%
    data.table()

muta$ID <-
    factor(
        muta$ID,
        levels = muta$ID
    )



```

```{r}
barplot_muta <- ggplot(muta, aes(x = ID, y = mean_muta3)) +
  geom_bar(stat = "identity") + 
  geom_col(aes(fill = mean_muta3)) + 
  scale_fill_gradient2(low = "green", 
                       high = "red", 
                       midpoint = 0.5) +
  theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5)) +
  ggtitle("Schematic plot of in silico mutagenicity analysis") +
  xlab("Compound") + ylab("Mutagenic score") +
  theme(legend.position = "none")

barplot_muta

```

## Carcinogenicity

### Barplot 

```{r}
carc <-
    carci_smry %>%
    arrange(-mean_carci) %>%
    data.table()

carc$ID <-
    factor(
        carc$ID,
        levels = carc$ID
    )
```

```{r}
barplot_carci <- ggplot(carc, aes(x = ID, y = mean_carci)) +
  geom_bar(stat = "identity") + 
  geom_col(aes(fill = mean_carci)) + 
  scale_fill_gradient2(low = "green", 
                       high = "red", 
                       midpoint = 0.5) +
  theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5)) +
  ggtitle("Schematic plot of in silico carcinogenicity analysis") +
  xlab("Compound") + ylab("Carcinogenic score") +
  theme(legend.position = "none")

barplot_carci

```